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a new solution to an old problem using a novel expan- 
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Summary. Good's formula and Fisher's method are frequently used for combining indepen- 
dent P-values. Interestingly, the equivalent of Good's formula already emerged in 1910 and 
mathematical expressions relevant to even more general situations have been repeatedly de- 
rived, albeit in different context. We provide here a novel derivation and show how the analytic 
formula obtained reduces to the two aforementioned ones as special cases. The main novelty 
of this paper, however, is the explicit treatment of nearly degenerate weights, which are known 
to cause numerical instabilities. We derive a controlled expansion, in powers of differences in 
inverse weights, that provides both accurate statistics and stable numerics. 

Keywords: combine P-values; nearly degenerate weights; exponential variable; gamma vari- 
able; gamma distribution; Erlang distribution 



• • i. Introduction 

> ■ 

k> The question of how to obtain an overall signific ance leve l for the results of independent runs 
of studies has been investigated since the 1930s (jTippett . 1931 ; Fisher . 1932 ; Pearson, 19331 



d 119381 ). In fact, forming a single statistical significance out of multiple independent tests has 
been an imp ortant subject of study in numerous area of scie ntific disciplines, in cluding socia l 
psycholo gy (Stouffer et all fl 949: Mo steller and Bush . 19541). me dical researc h (l01kinLll995l) . 
genet ics ( Loesgen et all l200ll). proteomics (lAlves et all 20081). genomics (jrless and Iverl . 
l2007t) . bioinformatics (|Bailev and Gribskovl 1 19981: 1 Yu et all 120061 ) and others. 

Frequently used methods for combining independent P-values fall into numeric and an- 
alytic categories. This classification is not totally precise since method such as Fisher's 
started out with the necessity of i nverting the y 2 cumulative distribution fun ction and thus 
seem ed like a numerical approach ( Pearsonl lT9381. The method mentioned in ( Bahrucha-ReidL 
Il960l ). although not in the context of combining P-values, brought out an analytical expres- 
sion for combined P-value using Fisher's method, thus effectively brought Fisher's method 
into analytic category. In the context of combining P-values, by mapping to a known result 
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by Feller (jFellerl . Il966h . iBailev and Gribskovl (|l998h also provided an analytic formula for 
Fisher's combined P-value. Numerical approaches typicall y involve inverting c umulative 
distribution functions . For example, Sto uffer's z-methods Jstouffer et all Il949h. whether 
unweighted (jWhitlockl . 120051 ) or weighted (|LiptaM . ll958tlKoziol and Tuckwelll.ll994l). requ ire 
inverting the error function. Lancaster's generalization ( Lancasteri Il96ll : iKoziol . 19961) of 
Fisher's formalism also requires inverting gamma distribution function to incorporate un- 
equal weighting for P-values combined. 

In th is paper, we focus on analytic methods only. Two existing analytic approaches , 
Fisher's dFisherl. Il932t: iBahrucha-Reidl Il960i: IBailev and Gribskovl Il998t lAlves et all 120081) 
and Good's (jGoodJl955llLikeslll967l ) , are frequently employed. Fisher's method combines 
L independent tail-area probabilities democratically to form a single significance assign- 
ment while Good's formula weights each tail-area probability differently to form a different 
single significance assignment. Since Good's expression involves, in the denominator, pair- 
wise differences between weights, he cautiously remarked that the expression may become 
ill-conditioned and thus the calculations should be done by holding more decimal places 
when we ights of sim i lar magnitudes exist. This s t ateme nt has been paraphrased by many 
authors (jBhoil Il992t lOlkin and Sanen . 120011 iHoul . l2005h . 

In addition to the cases considered by Fisher and Good, it is foreseeable that one may 
wish to categorize obtained independent P-values into different groups so that one would 
like to weight P-values within the same group democratically and weight different group 
differently. We will call this scenario the general case (GC). The GC naturally occurs since 
one may wish to categorize data obtained from the same type of experimental instruments 
into the same group, and data collected from different instrument types may justify the 
use of different weights. When there is only one instrument type, the GC reduces to 
Fisher's consideration. When there exist no replicates within each instrument types, the 
GC coincides with the consideration of Good. In principle, the weighted version of the 
Stouffer's method can also be used for this purpose. Since the main scope here is to pursue 
analytic results, we won't delve into methods in numerical category. 

It is important, however, to point out that the mathematical problem of combining P- 
values is also related to other areas of research. For example, the equivalent of Good's for- 



mula had already emerged in 1910 in the context of sequential radioactive decay (jBateman 



1910), while analytical expression for Fisher's combined P-value emerg ed in 1960 as a 
speci al case of the former w hen all the decay constants are identical ([Bahrucha- Reidl . 
1960 j). After Good's formula (IGoodl . Il955l) . the same distribution function was rederived 
bv iMcGill and Gibbon! (jl965l ) and later on by iLiked (jl967l) . As for the GC, Fisher's 
method included, the mathematical equivalents appear in different areas of studies mainly 
under the consideration of sum of exponential/gamma variables. The distribution func- 
tion of linear combination of exponential/gamma variables are useful in various fields. 
When limited to exponential variab les, it r esults in the Erlang distribution that is of- 
ten enc ountered in queuing theory ( Morsel Il958 ). It is also connected to the renewal 
theory ( Cox . 19621). time series problem (| MacNeill , 1974 ). and can be applied to model 
reliability ( Jasiulewicz and Kordeckl 12003 ). The intimate connections between these seem- 
ingly different problems are not obvious at first glance. Consequently, it is not surprising 
that the distribution function has been rediscovered/rederived many times and that some 
information about it has not been widely circulated. Our literature searches show that 
the first explicit result (withou t further derivatives involved) for the distribution function 
was o b tained by Mathai ( 19831) . Subse quently, motivated by different contexts, Harrison 
( 199dl ), Amari and Mirsa (1997), and Jasiulewicz and Kordecki (2003) all rederived the 
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same distribution function. 

Employing a complex variable integral formulation, we are able to provide a different 
derivation of the distribution function and become the first to make connection to the GC of 
combining P-values. Since both Fisher's and Good's considerations arise as special limiting 
cases of the GC, we also illustrate that our cumulative probability distribution for GC indeed 
reduces to the appropriate limiting formulas upon taking appropriate parameters. The main 
novelty of this paper, however, is the explicit treatment of cases where nearly degenerate 
weights exist. These cases are known for nu merical instabilities, which motivated many 
authors to pursu e unco ntrolled approximations ( Solomon and Stephens! . 1977tlGabler . 1987t 
Bhoi , 1992 ; Houl . 120051 ). We have derived a controlled expansion, in power of differences in 



inverse weights, that provides both accurate statistics and stable numerics. 

In the following section, we will first summarize Fisher's and Good's methods for com- 
bining P-values, followed by the mathematical definition of the GC. A section devoted to 
derivation of the probability distribution function and cumulative probability for the GC 
then follows. We then delve into the case of nearly degenerate weights and provides a for- 
mula with controllable accuracy for combining P-values. A few examples of using the main 
results are then provided. This paper concludes with future directions. A CH — h program, 
which computes the combined P-values with equal numerical stability regardless of whether 
weights are (nearly) degenerate or not, is available upon request from the authors. 



2. Summary of Fisher's and Good's methods for combining P-values 

Imagine that one wishes to combine L independent P-values Pi,p2, ■ ■ ■ ,Pl, each of which 
is drawn from an uniform, independent distribution over (0,1]. For later convenience, let 
us define 

T F = P1-P2-- -PL , (1) 

r G = PT-PT---PT- (2) 

To form a unified significance, Fisher and Good considered respectively the stochastic quan- 
tities Qf and Qg, defined by 

Q F = xi ■ x 2 ■ ■ ■ x L , (3) 

Qg = ■<■■[ ■■<Y ■ (4) 

where each Xi represents a random variable drawn from an uniform, independent distribu- 
tion over (0, 1]. The following probabilities 



L-l 



Prob(Q F <r F ) = t f J2 [H1/ ' f)] (5) 

L 

Prob(Q G <r G ) = J2 AiTg1M ( 6 ) 



1=1 



provide the unified statistical significances, corresponding respectively to Fisher's and Good's 
considerations, from combining L independent P-values. In eq. ((BJ, the prefactor A; is given 

by 
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Apparently, A; is ill-defined when the weight wi coincides with or is numerically close to 
any other weights Wk- Although Fisher did not derive ([5]), from this point on, we shall refer 
to (O as Fisher's formula and (|6]) as the Good's formula. 



3. General case including Fisher's and Good's formulas 

Let us divide the L P- values into m groups with 1 < m < L. Within each group fc, 
we weight the nk P-values within equally; while P-values in different groups are weighted 
differently. Therefore, when m = L and nk = 1 V fc, we have the Good's case and when 
m = 1 and n\ = L, we reach Fisher case. We will therefore define the following quantities 
of interest 



r - n 

fc=l 

m 

q - n 



k=l 



n 



(8) 
(9) 



where each Xk-j represents again a random variable drawn from an uniform, independent 
distribution over (0,1]. The quantity of interest Prob(Q < t), if obtained, should cover 
both results of Fisher's and Good's as the limiting cases. In the next section, we will derive 
an exact expression for Prob(Q < r) and describe how to recover the results of Fisher's and 
Good's. 



4. Derivation of Prob(Q < r) 

Let F(t) = Prob(Q < r), we then may write 

el pi / m 



F(t) 



n 

k=l 



.1=1 



m n k 

n n dxk -<j 

fe=ij=i 



(10) 



where the heaviside step function 0(x) takes value 1 when x > and value when x < 0. 
Upon taking a derivative with respect to r, we obtain 



f(r) 



_ dF(i 



dr 



n 

k=l 



n xfc y 



II II'/' w (11) 

fc=l j=l 



where S(x) represents Dirac's delta function that takes zero value everywhere except at 
x = and that V a > 0, /" S(x)dx = 1. 

To proceed, let us make the following change of variables 



and remember that if yo is the only root of / (f(yo) = 0) 

% - yo) 



s(f(v)) 



\f'(yo)\ 1 
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we may then rewrite (fTTj) as 
f(r) = 



1 m 






X! uk >j 


y fc=i 


3=1 



n n duk -3 

fc=l3=l 



(12) 



Note that the right hand side of (|T2"j) . except for the additional factor e', is the probability 
density function of a weighted, linear sum of exponential variables. 
By introducing the integral representation of the 5 function 



1 f°° 



■iq(t—c) 



we may re-express ffT^)) as 
f(r) = 



n 

i=i 

m 

n 

i=i 



dq 
2^ 


e -it(q+i 


m 

n 

fe=i 


" />oo 

/ e"V 
Jo 




e -it(q+i 


m 

n 

fe=i 




1 


2^ 




1 - 


- iqw k _ 


2^ 


e -it{q+i 


m 

n( 


' i 


\ni m 

n 






! "It 


p OC 

— oo 


2^ 



9 + irk 

m 

n 

fc=i 



i 



q + ir k 



/(r;ni,n 2 ,...,n m ) 



(13) 
(14) 



where r k = 1 /u>fc is introduced for the ease of analytical manipulation and / is introduced 
for later convenience. Since all w k > 0, implying that all r k > and thus the poles of the 
integrand in (fl3l) lie completely at the lower half of the q-plane. The integral of q may be 
extended to enclose the lower half q-plane to result in 



fir) 



1=1 

m 

Ii(in) ni 



-2wi 



E 



[d/dqf 



- it q 



n 



1 



q + irj 



q=-ir k 



1 = 1 



n 

i=i 



(-l)^-l(it)^ 7 r ;„, 1 • , h )\ 



n 



E E 

fc=l I 31,32, ••• ) flm=0 



(t)« 



9% ! 



n 



3 = l,j¥=k 



( nj -l+ gj )\ (-1) 

(n 5 -l)!ft! (/•., !■>,)" ■ ■ 



Comparing eq. (ITS1) with eq. (fT2"1) . we see that the right hand side of (IT5|) is composed 
of the product of the factor e and the probability density function of a weighted sum 
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of exponential variables. In fact, the explicit expre ssion of l atter , in addition to the new 
derivation presented here, was derived much earli er ([Mathail . 1983) under different context 
and was rediscovered/rederived multiple times ( Harrison! . Il990t lAmari and Mirsal . Il997 : 



Jasiulewicz and Kordec ki. 2003) by different means. Its connection to combining P-values, 



however, was never made explicit till now. 

From (fit))) , we know that F(t — 0) = 0, implying that 



nT />OG 

F(t) = / f(r')dr' = / /(e" 4 ') 
Jo Jt 



(=1 J k=l 91,92, ...,g m =0 \j=l,j^k 
J2 gi=n k -l 



n 

i=i 

m 

n 

1=1 



E E 



E E 



n 

m 

n 



( nj -i+ gj )\ (-1)^ 

(rij-l)!^! (rj-rfc) n i+» 
fa- -! + <&)! (-1)* 



00 (t^s* 



E j+i 



fc=l gi,92, ...,g m =0 \j=X,j^k 

£ 9i=«fc-l 



E E 



fc=l 91,92, ...,g m =0 \J = l,J#fc 
X)9i=™fc-1 



tt K -l + g,)! (-r fc )^rf | ,y. 



m rife— 1 rife — 1 — 9fc 

EE E i n 

fe=l 9fc=0 9Mfc=° \J=lj3#* 



+ (-r fc )*^ 



(5fe - 0! 



fjj 



H{r k t, g k ) 



where the function H is defined as 

H (x, n) 



n ^ 

EX 

fc=0 



(17) 



Eq. ([TBI represents the most general formula that interpolates the scenarios considered by 
Fisher and Good. 

Let us take the limiting case from (TI7J)) . For Fisher's formula, one weights every P- value 
equally, and thus correspond to m = 1 and n\ — L. The constraint in the sum of (fl6|) forces 
<?i = ni — 1 = L — 1. Consequently, we have (by calling n by r for simplicity) 



Prob(Q F < t f ) = H(rt, L—l) = e~ rt ^ 



(Hi 
l\ 



(18) 



Notice that regardless whatever the weight w one assigns to all the P-values, the final 
answer is independent of the weight. This is because t = — lnr = — wlnr F = (— In r F )/r 
and therefore rt = 1ii(1/t>). This results in 



L-l r 

Piob{Q F <T F )=T F J2- 



ln(l/r F )]' 



1=0 



l\ 



(19) 



exactly what one anticipates from ([5]). To obtain the results of Good, one simply makes 
m = L and rik = 1 V k, implying all gi = 0. In this case, (1161) becomes (with r; = l/wi, 



e _t = t g and F(x, 0) = 1 V x) 
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prob(g G < t g ) = 53 J] 



l/lil; 



reproducing exactly ([5]) 

One may also re-express eq. (|16p in a slightly different form 



F(t) 



n 

U=l 



m njj — 1 ^ 

E E ^+r ff ( r fc*.flfc) x 



fe=l 9fc =0 fe 

x E 



n 



J2i 9i=«fc-l 



n 



F(T;n 1 ,n 2 ,...,n m ) 



(20) 



(21) 



Note that in the expression (|2T|) . we have isolated an overall multiplying factor and keeps 
explicit the n\<k< m dependence for later convenience. 



5. Cases of nearly degenerate weights 



In our derivation of p^|) . it is explicitly shown in section 2] that the final P- value obtained 
is independent of the weight w that was used to assign to all the individual P-values, 
Pi,P2, ■ ■ ■ ,Pl- It is thus natural to ask, if one starts by weighting each P- value differently, 
upon making the weights closer to one another, will one recover Fisher's formula ([5]) from 
Good's formula ©? By continuity, the answer is expected to be affirmative. More gener- 
ally, one would like to have a formal protocol to compute the combined P-value when the 
weights may be categorized into several subsets, within each subset the weights are almost 
degenerate. 

In this section, we first illustrate the transition from Good's formula to Fisher's formula 
by combining two almost degenerate P-values. We will then provide a general protocol to 
explicitly, when there exist nearly degenerate w eights, deal with the possible numeri cal in- 
stabi l ity that was first cautioned b y Good! (1955) and subsequently by many authors (Bhoj 
19921: lOlkin and Sanerl liOQll IHouL mm . 



Let us consider combining pi and p 2 with weights w\ and W2 using Good's formula. One 



has 



Prob(Q G < t g ) = 



Wl - w 2 



wipip 2 



W 2 Pi P2 



(22) 



Without loss of generality, one assumes w\ > w 2 and hence writes w\/w 2 = 1 + e with e > 0. 
We are interested in the case when the weights get close to each other, or when e — > 0. We 
now rewrite eq. (122|) as 



Prob(Q G < t ) = 



w 2 



Wl - w 2 



PlP 2 

w 2 



Pi Pi 



(l + e)PiP 2 1H 



Pi P2 



(23) 
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In the limit of small e, we may rewrite (|23p as 

(1 +e) e -TT7 lnP2 _ g^nPi 

= ^[e-e^ft + toftJ + O^)] 

= [1 - ln(pif> 2 ) + 0(e)] 
► Pip 2 [1 - Hp lP2 )} = Prob(Q f < t f ) . (24) 

Note that when the small weight difference w\ — u>2 is near the machine precision of a digital 
computer, using formula ((B)) directly will inevitably introduce numerical instability caused 
by rounding errors. 

To construct a general protocol to deal with nearly degenerate weights, one first observes 
from eqs. (|13ti21[) that it is the inverse of weights = \]w% that permeates the derivation 
of the unified P- value. The closeness between weights is thus naturally defined by closeness 
in the inverse weights. As shown in eqs. @ and ([5]), the combined P- value by Good's 
formula is independent of the absolute size of the weights but only on the relative weights. 
Making the observation that t in eq. (|16D only depend on the ratios rj^k/rk, one also sees 
explicitly that the most general combined P- values (see (fl6)) ) only depend on the relative 
weights as well. We are thus free to choose any scale we wish. For simplicity, we normalize 
the inverse weight associated with each method by demanding the sum of inverse weights 
equal the total number of methods 

M M 

where 1 /rj represents the weight associated method j and M represents the total number of 
P- values (or methods) to be combined. For the GC described in section 01 M = J2T=i n k- 
This normalization choice makes the average inverse weight of participating methods be 1. 

The next step is to determine, for a given list of inverse weights and the radius of 
clustering, the number of clusters needed. This task may be achieved in a hierarchical 
manner. After normalizing the inverse weights using eq. (I25[) . one may sort the inverse 
weights in either ascending or descending order. For a given radius r\ > 0, one starts to 
seek the pair of inverse weights that are closest but not identical, and check if it is smaller 
than the radius 7/. If yes, one will merge that pair of inverse weights by using their average, 
weighted by number of occurrences, as the new center and continue the process till every 
inverse weights in the list is separated by a distance farther than 77. We use an example of 
M = 8 to illustrate the idea. Let the normalized inverse weights {rj}® =1 be 

0.50, 0.70(2), 0.71, 0.74, 1.03,1.80,1.82 

where the number 2 inside the pair of parentheses after 0.70 simply indicates that there 
are two identical inverse weights 0.70 to start with. Assume that one chooses the radius 
of cluster 77 to be 0.005, since every adjacent inverse weights are separated by more than 
0.005, no further clustering procedures is needed and one ends up having seven effective 
clusters: one cluster with two identical inverse weights 0.70, and the rest of six clusters are 
all singletons. This corresponds to m = 7, ni = 1, ri2 = 2, 723 = 714 = • • • = = 1. 



Prob(Q G < T B ) 



P1P2 



(1 



IP2 



Pi 



P1P2 
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Suppose one chooses the clustering radius 77 to be 0.05. In the first step, we identify that 
0.70 and 0.71 are the closet pair of inverse weights. The weighted average between them is 



2-0.70 + 0.71 2.11 



0.703 



The list of inverse weights then appears as 

0.50, 0.703(3), 0.74, 1.03,1.80,1.82. 

The closest pair of inverse weights is now between 1.80 and 1.82, and upon merging them 
we have the list now appears as 

0.50, 0.703(3), 0.74, 1.03,1.81(2). 

Next pair of closest inverse weights is then 0.703 and 0.74. The weighted average leads to 
(2.11 + 0.74)/4 = 0.7125. After this step, the list of inverse weights appears as 

0.50, 0.7125(4), 1.03 ,1.81(2) , 

indicating that we have m = 4 ( four clusters), with number of members being n\ = 1, 
ri2 = 4, 77,3 = 1 and 7x4 = 2. The centers of the four clusters are specified by the average 
inverse weights: 0.50, 0.7125, 1.03 , 1.81. The distance between any two of the average 
inverse weights is now larger than 0.05. 

This is a good place for us to introduce some notation. We shall denote by + r]k-j the 
jth inverse weights of cluster fc, whose averaged inverse weight is r^. With this definition, for 
the example above, we have r)x-,i = 0, 77 2; i = 7/ 2; 2 = —0.0125, rj2-z — —0.0025, rj2 t i = 0.0275, 
7 ?3;i = 0, ?74 ; i = —0.01, and rji-p = 0.01. 

Using the hierarchical protocol mentioned above, the number of clusters m and the num- 
bers of members rife associated with cluster k are all obtained along with {rjk-j}- Following 
the derivation in sectional we obtain a probability density function very similar to ([TJ 



f(r) = 



i=U=l 



(0 



j m 

r£ e ~it(q+i) J| 
^ fc=l 



n 



1 



q + i(r k + r]k;j) 



(26) 



From section |4j we see that the ill-conditioned situations emerge when some weights 
are nearly degenerate and the source of difference in inverse weights comes from obtaining 
F(t; 7ii, 77,2, • ■ • j n m ) in (|21l) from /(r; Ui, TI2, • ■ • , n m ) in (| 14[) . Therefore, one may leave the 



prefactor \J[™ =1 U]U(ri + Vl;j) 
eq. (|26l) . To proceed, we write 

1 



untouched and focus on the rest of the right hand side of 



1 



q + i(rk + Vk-j) 



q + irk 
1 



q + irk 



1 



-In 1+-^ 



in 



exp 



-1 % i: 



E 1 

^ 9 \q + ir k 



q + irk 

9 



Consequently, we may write 
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where 



(28) 



The product in eq. (|26[) may now be formally written as 



n 

k=l 



n - 



If ± 



exp 



Ew XI ( g ,■< 

,g=l fe=l ^ B / 



(29) 



We now note a simplification by choosing r k to be the average inverse weight of the fcth 
cluster. In this case, we have Y^=i Vk-j = V k. That is, Y^i = always. This allows us 
to write eq. (|29l) as 



n 

k=l 



n 



l[ ? + «( r fe + Vk;j) 



n 

Lfe=l 



1 



(<7 + ir fc ) n, = 



exp 



Ls=2 fc=l 



(g + zr fc ) 



(30) 



The key idea here is to Taylor expand the exponential and collect terms of equal number 
of l/(q+ir). Evidently, the first correction term starts with l/(q + ir) 2 . Furthermore, before 
the 1/ (q+ir) 4 order, there is no mixing between different clusters. Below, we rewrite eq. (|26p 
to include the first few orders of correction terms 



f(r) 



f?£ p -it(l+i) 



exp 



(q+ir k ) g 



2n~ IIfc=l (« + **•*)"* 

m 

/(r; {n,}^) + £ F fc;2 />; {n /#fe , n k + 2}) 

(^;2) 2 



fc=l 



fe=i 



fe=i 



2! 



f(T;{ni^ k ,n k + 4}) 



_^ rn 

X r fe;2^';2/>;{^fe,fe',nfc + 2,n fc ,+2}) + 0(ry 5 



2! 

fe,fe'=i 

k^k' 

This immediately leads to 



(31) 



F(r; {n,}]^) + £ F fc;2 F(r; {n^ fcj n fc + 2}) 



fe=l 



k=l 



k=l 



(Yk;2f 

2! 



J F , (r;{n; #fc ,n fc + 4}) 



E y fe;2^c';2^(T;{n^ fc:fe ',n fc + 2,n fe '+2}) + 0(?7 5 



(32) 



k,k'=l 
k^k' 



Note that when the cluster radius r\ is chosen to be zero, the only clusters are from 
sets of identical weights, and all r\ k ._j must be zero. In this case, only the first term on the 
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right hand side of (13"2l exists and the result derived in section 0] is recovered exactly. Since 
all F are finite positive quantities, the errors resulting from truncating the expression in 
eq. (I32p at certain order of 77 can be easily bounded. Therefore, any desired precision may 
be obtained via including the corresponding number of higher order terms. As the main 
result of the current paper, our expansion provides a systematic, numerically stable method 
to achieve desired accuracy in computing combined P- values. 



6. Examples 

Example (a): This example, assuming m = 4, demonstrates how to compute the F(t; {ni}) 
function present in eq. (|2"TT) . Let be the inverse weights associated with cluster k. When 
combining multiple P-values with a prescribed clustering radius on the inverse weights, see 
(f2"S")) and the previously described clustering procedure, the rfcS and the deviations rjk-j are 
obtained once and for all. The r/k-.j, through eq. (I28[) . constitute the key expansion pa- 
rameters Yk-g that yield, upon multiplying by P(r;n;) with different {n{\, the higher order 
terms in our key result (|32p . Note that in eq. (|32p . in the zeroth order term, the argument 
ni of F represents the number of members associated with cluster I. However, for higher 
order correction terms, the n;s entering F no longer carry the same meaning. Therefore, in 
the example shown here, one does not assume that nj is the number of methods associated 
with cluster k. We now illustrate how to open up the sum in eq. (I21I) . The constraint 
Y] { gi = 7ifc — 1 implies that one only has m — 1 independent giS. Once m — 1 giS are 
specified, the remaining one is also determined. To simplify the exposition, let us introduce 
the following notation 

a(g .. j k ) = + (-1)" 

One then expand the sum in (|21[) as 

ni - 1 tt( . \ ni —l— gi m- 1— g i— gi 

^( T ) = E 111 E "(525 2,1) £ a( 53 ;3,l) a(. 94 ;4,l) 

31=0 r l 32=0, 33=0 

»2 - 1 jjl . \ "2 - 1- 32 "2- 1-3 2 -31 

+ E 9 2 2-H E «(5i;l>2) "(53! 3, 2) a( 54 ;4,2) 

32=0 r 2 ffl =0 33=0 

n 3 — 1 „/ s "3-1—33 "3-1-33-31 

+ E 3 3 3+i g E «(3i;l>3) £ a( 52 ;2,3) a( 54 ;4,3) 

33=0 r 3 gi =0 32=0 

"4- 1 „/ , \ "4 — 1— 94 "4-1-34-31 

+ E 3^1 E «(5i;l>4) £ a( 52 ;2,4) a( ff3 ;3,4). (33) 

ff 4=0 r 4 g 1= 32=0 



Example (b): This example illustrates the possibility of numerical instability associated 
with eqs. ([6]) and (j2"Tj) when they are used to combine P-values with nearly equal weights. We 
also show how such instabilities are resolved by using eq. (|32l) . Consider the case of com- 
bining five P-values, {0.008000257,0.008579261,0.0008911761,0.006967988,0.004973110}, 
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weighted respectively by {0.54531152, 0.54532057, 0.54531221, 0.54531399, 0.54531776}. Us- 
ing eq. (|2|), one obtains tq — 4.30656196 x 10~ 7 . The combined P- value is then obtained 
as the probability of attaining a random variable Qg, defined in eq. (|4]), such that is less 
than or equal to tq. 

Combining P- values using eq. ([B]) gives 

Prob(Q G < tq) = 1923475672.53812003+ 134195847.49348195 

-3271698577.16100168+ 1726093852.57087326- 512066795.44147670 
= -0.00000322 . 

When one uses equation ([2"Tj) . r takes the value of tq and the random variable Q is simply 
Qg, and the combined P- value becomes 

Prob(Q < r) = 170090507.09336647+21761086.68190728 

-972903041.25101399 + 941269625.31004059 - 512066795.44252247 
= -0.00000006 . 

Apparently, probability can't be negative and the negative values shown above illustrate 
how eqs. © and (|2ip may suffer from numerical instability when the weights are nearly 
degenerate. This numerical instability is removed by applying equation (|32[) which combines 
weighted P- values using a controlled expansion and yields, for this example, 

Prob(Q < r) = 5.379093 x 10~ 8 + 1.407305 x 10" 16 

-1.066323 x 10~ 21 + 1.634917 x 10~ 25 + 0(1O~ 29 ) 
= 5.37909 x 10~ 8 



Example (c): One natural question to ask is that how well does eq. (j32|) work when one 
chooses a larger clustering radius and group weights that are clearly distinguishable into 
a few clusters? To consider this case, let us use the five P-values from example (b) above 
but with weights chosen differently. Let us assume that the inverse weights (r^ = 1/wk) 
associated with these five P-values are {0.6,0.65,1.2,1.25,1.3}. For this case, r = tq = 
1.935663 x 10~ 13 . Combining P-value using formulas (j6]) yields 

Prob(Q < t) = 2.187324 x 10~ 6 - 5.946040 x 10~ 7 + 2.131226 x 10" 13 
-8.011644 x 10~ 14 + 7.639290 x 10~ 15 
= 1.59272 x 10~ 6 , 

while combining P-values using (|2"Tj) yields identical results 

Prob(Q < r) = 1.725699 x 10~ 6 - 3.049251 x 10~ 7 + 1.311524 x 10 -13 
-6.162803 x 10~ 14 + 7.639290 x 10~ 15 
= 1.59272 x 10~ 6 . 



When one uses t\ = 0.1 as the clustering radius, one obtains two clusters: one with 
average inverse weight 0.625 and the other with average inverse weight 1.25. If one then 



Combining independent P -values 1 3 



uses eq. ([521) to combine P-values, one attains the following results 

Prob(Q < t) = 1.472453 x 1CT 6 + 1.171521 x 1(T 7 + 

+2.584710 x 1(T 9 + 4.889899 x 1CT 10 + 0(1CT 12 ) 
= 1.59268 x 1(T 6 , (34) 

which contains no sign alternation and agrees well with the results from both ([6]) and (j2Tj) . 
This illustrates the robustness of eq. (j32|) in combining P-values. Note that the third term 
on the right hand side of (|54")l is zero. This is because the multiplying factor Yfc ; 3 is zero 
for both clusters. In general, Yfc : 3 measures the skewness of inverse weights associated with 
cluster k and for our case here both cluster of inverse weights are perfectly symmetrical 
with respect to their centers, leading to zero skewness. If the inverse weights of cluster k 
distribute perfectly symmetrically with respect to its center, it is evident from eq. (|28[) that 
Yk-g = for odd g. 

Evidently if one chooses a large clustering radius rj and then uses eq. (|32p to combine P- 
values, many higher order terms in the expansion will be required to achieve high accuracy 
in the final combined P-value. 



7. Future directions 



Although the expression ([IT))) provides access to exact statistics for a broader domain of 
problems and our expansion formula (|3"2"|) provides accurate and stable statistics even when 
nearly degenerate weights are present, there remain a few unanswered questions that should 
be addressed by the community in the near future. For example, even though we can 
accommodate any reasonable P-value weighting, thanks to (|32j) . the more difficult ques- 
tion is how does one choos e the right set of we i ghts when combining statistical signifi- 
cance (IZelen and Joel , 1959; iKoziol and Perlmanl Il978t iHedeesl Il985t IPepe and Fleming] . 
1989; iForrestl l200li The weights chosen reflects how much does one wish to trust var- 
ious obtained P-values. Ideally, a fully systematic method should also provide a metric 
for choosing appropriate weights. How to obtain the best set of weights remains an open 
problem and definitely deserves further investigations. 

Another limitation of (|16l) and (|3"2")l . and consequently of Fisher's and Good's formulas, 
is that one must assume the P-values to be combined as independent. In real applications, 
it is foreseeable that P-values reported by v arious methods may exhibit non- negligible 



correlations. How to obtain the correlation ( Wei and LachinL 1984 : Pocock et al 



1987 



Jamesl. 1991) and properly take into account t he existence of P-value correlations ( Brown . 
1975 1 iKost and McDermottl . l2002t iHoul . 120051 ) is also a challenging problem that we hope 
to address in the near future. 
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